I had the pleasure of interviewing Mike Moran about the use of artificial intelligence and machine learning in marketing. We talked about how Mike is really training Artificial Intelligence instead of artificial stupidity, among other things. I hope you’ll give it a listen.
The full transcript is in the YouTube video, but here’s the first question that I asked Mike:
Tim Peter: All right, so let’s start off. Obviously, AI and Machine Learning depend upon data. Can you talk a bit about the ability to leverage AI and how that’s tied to having your data in a good state to start with?
Mike Moran: Sure, yeah, I don’t think there’s anything more important than having your data in a clean state. And what that means can be very different for different applications. If you’re using supervised machine learning, where you have a set of training data from which you’re creating a model, it’s really important not only that the data itself be easy to import and all in the same format, but also that you’re very confident in the accuracy of your outcome data.
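To make that concrete, here’s a minimal sketch in Python (using pandas, with hypothetical column names like email_opens and converted) of the kind of basic readiness check Mike is describing: every feature imports in the same numeric format, and every row has a trustworthy outcome value.

```python
import pandas as pd

# Hypothetical training set for a marketing model: a few feature columns
# and one outcome column ("converted") that the model will learn to predict.
training_data = pd.DataFrame({
    "email_opens": [3, 0, 7, 1],
    "site_visits": [5, 1, 12, 2],
    "converted":   [1, 0, 1, 0],  # the outcome we need to be confident in
})

feature_cols = ["email_opens", "site_visits"]

# All features should import as numbers, not as strings in mixed formats.
assert all(pd.api.types.is_numeric_dtype(training_data[c]) for c in feature_cols)

# Every row needs an outcome, and it should only take the agreed-upon values.
assert training_data["converted"].notna().all()
assert set(training_data["converted"]) <= {0, 1}

print("Training data passes basic readiness checks.")
```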
So, for example, say you were trying to predict the weather, and the way you were doing that was by taking all sorts of inputs of barometric pressure and wind speed, looking at where the systems are a hundred miles away, and using that type of data to predict what was going to happen in an hour or three hours. In order to do that, you have to have good data, and you have to agree on what it is that actually happened. With something as amorphous as the weather, for some people accuracy means, “How many inches of snow did we get?” For other people it means, “Did we get precipitation at all?” Because predicting snow versus rain is really difficult when it’s 32 degrees out.
And so, what looks like a really bad miss, where someone says, “Man, we got five inches of snow, and they said it was going to be raining,” may have happened because the temperature forecast was off by two degrees. So trying to figure out what your outcome actually is, what you’re really trying to predict, is really important when you’re putting your data together.
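Here’s a small sketch of that point about defining the outcome, with made-up forecast and observation values: the same prediction can count as a hit or as a bad miss depending on which definition of accuracy you pick.

```python
# Hypothetical forecast and observation for a single day.
forecast = {"precip_type": "rain", "amount_inches": 0.5, "temp_f": 33}
observed = {"precip_type": "snow", "amount_inches": 5.0, "temp_f": 31}

# Definition 1: did we correctly predict whether any precipitation fell?
any_precip_correct = (forecast["amount_inches"] > 0) == (observed["amount_inches"] > 0)

# Definition 2: did we correctly predict rain versus snow?
type_correct = forecast["precip_type"] == observed["precip_type"]

print(any_precip_correct)  # True  -> a "hit" under the looser definition
print(type_correct)        # False -> a "bad miss," caused by about two degrees of temperature
```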
The other thing that’s important is that a lot of times the outcomes you’re trying to predict are actually a stand-in for human judgment. So, think about the Watson application and how it’s trying to do medical diagnoses. What’s really happening here is that you have to be careful that your data is actually representative of the correct answer, not just the most popular answer. So we’re playing Jeopardy rather than Family Feud.
On Family Feud, the winning answer is whatever everyone thinks is the right answer, and that’s only good if it actually is the right answer. So making sure that your data was really compiled by experts who agreed with each other, rather than by just anybody you gave a survey to, is a really important distinction. I think the most important thing for having your data ready is that you know where it is, you know that it is accurate, and you know how to import it into the machine learning environment.
The easiest way to do that is if you have data that’s all in a common format. But you can bring in data that’s in multiple formats as long as you really understand that the different fields in the data are all defined the same way. Having a data dictionary that says, “This is what this field means, and we know exactly how it was compiled and that it’s being compiled the same way everywhere,” is just as good as if it all came from the same source.
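As a rough illustration, a data dictionary might look something like this in code, with hypothetical source systems and field names: each field has one agreed meaning and unit, and records from different sources are mapped into that shared definition before they’re combined.

```python
# A simple data dictionary: each field has an agreed-upon meaning and type.
DATA_DICTIONARY = {
    "customer_id": {"type": int,   "description": "unique customer identifier"},
    "revenue_usd": {"type": float, "description": "lifetime revenue, in US dollars"},
}

def normalize_crm_record(record: dict) -> dict:
    # The CRM export already reports dollars; just cast the types.
    return {"customer_id": int(record["id"]), "revenue_usd": float(record["revenue"])}

def normalize_billing_record(record: dict) -> dict:
    # The billing system reports cents, so convert to dollars to match the dictionary.
    return {"customer_id": int(record["cust"]), "revenue_usd": record["revenue_cents"] / 100.0}

combined = [
    normalize_crm_record({"id": "101", "revenue": "250.00"}),
    normalize_billing_record({"cust": 102, "revenue_cents": 49900}),
]

# Verify every combined record matches the data dictionary's types.
for row in combined:
    for field, spec in DATA_DICTIONARY.items():
        assert isinstance(row[field], spec["type"]), f"{field} has the wrong type"

print(combined)
```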
But the real issue, as I said before, is the outcome data. You have to know what your outcome is. You have to know that it’s being tabulated in a way that actually matches the thing you’re trying to predict. And if it’s based on human expertise, you need to make sure those are really experts, and ideally that there are multiple experts who agree with each other.
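One simple way to approximate that last point, sketched below with made-up labels, is to keep an outcome only when a clear majority of expert labelers agree on it, and to leave the ambiguous cases out of the training data.

```python
from collections import Counter

# Hypothetical labels from three experts for each case.
expert_labels = {
    "case_1": ["benign", "benign", "benign"],
    "case_2": ["benign", "malignant", "malignant"],
    "case_3": ["benign", "malignant", "uncertain"],
}

def consensus_label(votes: list[str], threshold: float = 2 / 3) -> str | None:
    """Return the majority label if it reaches the threshold, otherwise None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= threshold else None

labeled = {case: consensus_label(votes) for case, votes in expert_labels.items()}
print(labeled)  # case_3 gets None: no reliable outcome, so it shouldn't train the model
```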